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Most of the available methods for longitudinal data analysis are 
designed and validated for the situation where the number of sub- 
jects is large and the number of observations per subject is relatively 
small. Motivated by the Naturalistic Teenage Driving Study (NTDS), 

Q, ' which represents the exact opposite situation, we examine standard 

.^^ I and propose new methodology for marginal analysis of longitudinal 

. . count data in a small number of very long sequences. We consider 

"ti ' standard methods based on generalized estimating equations, under 

working independence or an appropriate correlation structure, and 
find them unsatisfactory for dealing with time-dependent covariates 
when the counts are low. For this situation, we explore a within- 
cluster resampling (WCR) approach that involves repeated analyses 
^ ■ of random subsamples with a final analysis that synthesizes results 

OO ' across subsamples. This leads to a novel WCR method which operates 

y^ , on separated blocks within subjects and which performs better than 

all of the previously considered methods. The methods are applied to 
the NTDS data and evaluated in simulation experiments mimicking 

Cn '. the NTDS. 

O 

1. Introduction. In this paper we consider the analysis of longitudinal 
data that arise in the form of a small number of long sequences. Our inter- 
est in this problem is motivated by the Naturalistic Teenage Driving Study 
(NTDS), an observational study of teenage driving performance and char- 
acteristics [Simons-Morton et al. (2011a, 2011b)]. In the study, 42 newly li- 
censed teenage drivers in Virginia were monitored continuously during their 
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Fig. 1. Summary of NTDS data by subject: the num,ber of trips made and the total 
number of miles driven, both plotted on the logarithmic scale. 



first 18 months of independent driving using in-veliicle data recording sys- 
tems. The instrumentation included accelerometers, video cameras, a global 
positioning system, a front radar and a lane tracker. The NTDS is a first of 
its kind, at least for teenage drivers in the United States. The study pro- 
vides valuable information on risky driving behavior, which can be assessed 
in terms of elevated gravitational force ((7- force) events (rapid start, hard 
stop, hard turn and yaw). Counts of g-force events are available for each trip 
(defined as ignition on to ignition off), and their incidence rates represent 
different aspects of risky driving behavior. The NTDS data set comprises 
more than 68,000 trips by the 42 teen subjects, with an average of 1,626 trips 
per subject. Figure 1 provides a summary of the NTDS data by subject in 
terms of the number of trips made and the total number of miles driven. An 
important goal in our analysis of the NTDS data is to understand how risky 
driving is associated with subject-level covariates (i.e., individual character- 
istics such as gender) as well as trip- level or time-dependent covariates (e.g., 
time since licensure, presence of passengers). 

There is an extensive literature on longitudinal data analysis; see, for ex- 
ample, Diggle et al. (2002), Fitzmaurice et al. (2008) and McCulloch, Searle 
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and Neuhaus (2008). Common approaches to longitudinal data analysis in- 
clude random effect models [Laird and Ware (1982), McCulloch, Searle and 
Neuhaus (2008)] and generalized estimating equations (GEE) [Liang and 
Zeger (1986), Zeger and Liang (1986), Zeger, Liang and Albert (1988)]. In 
general, the two approaches have different interpretations (subject-specific 
versus marginal), although that distinction is not important when the log 
link is used for count data. For analyzing the NTDS data, it is natural to 
consider generalized linear mixed models (GLMM) with appropriate random 
effects to account for population heterogeneity, serial correlation and/or pos- 
sible overdispersion [Chan and Ledolter (1995), McCulloch (1997), McCul- 
loch, Searle and Neuhaus (2008)]. However, a realistic GLMM for the NTDS 
data would involve several random components, including a latent process 
that induces serial correlation (see Section 5), and the computational de- 
mand of such a GLMM analysis can be prohibitive with thousands of trips 
per subject. Even if the computation is feasible, the resulting inference may 
be sensitive to modeling assumptions that are difficult to verify. The compu- 
tational burden can be reduced by resorting to a marginal analysis using the 
GEE approach, especially under a working independence assumption. The 
GEE approach only requires specification of the first two moments and not 
the entire distribution. The robust variance estimate can be used to make 
asymptotically valid inference that only requires correct specification of the 
mean structure, assuming that the number of subjects is large relative to 
the number of observations per subject. The exact opposite situation occurs 
in the NTDS, and Albert and McShane (1995) have shown that the robust 
variance estimate can perform poorly in such a situation and that the model- 
based variance estimate may be preferable. Numerous methods have been 
proposed to improve upon the robust variance estimate, including jackknife 
[Paik (1988), Lipsitz, Laird and Harrington (1990)], bias correction [Mancl 
and DeRouen (2001)] and window subsampling techniques [Sherman (1996), 
Heagerty and Lumley (2000)]. 

Given the well-known advantages and potential issues of the GEE ap- 
proach, it is natural to ask how to perform a simple and valid marginal 
analysis of the NTDS data with reasonable efficiency and robustness. We 
attempt to address this question in the present paper by examining some 
existing methods and developing new ones. We find that it is generally help- 
ful to include a fixed effect for each subject in the GEE model. Once the 
subject effects are estimated, they can be treated as the response variable in 
a subsequent linear regression analysis for estimating the effects of subject- 
level covariates. These fixed effects also help with trip- level covariates by 
removing the correlation due to population heterogeneity. If the counts are 
large (with a marginal mean of 1, say), the effects of trip-level covariates 
can be estimated from a conventional GEE analysis using an estimated co- 
variance matrix together with the robust variance estimate. For small counts 
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(with a marginal mean of 0.1), the conventional approach is not satisfactory, 
and we explore a within-cluster resampling (WCR) approach [Hoffman, Sen 
and Weinberg (2001), Follmann, Proschan and Leifer (2003)]. The WCR ap- 
proach was originally proposed to deal with informative cluster sizes, which 
is not of concern in the NTDS. Our motivation for considering WCR is to 
improve the performance of GEE methods by altering the data structure. 
To this end, we consider extensions of WCR that reduce serial correlation 
within clusters or increase the number of (approximately) independent clus- 
ters. This leads to a WCR method involving separated blocks which performs 
better than the conventional methods. 

The rest of the paper is organized as follows. In the next section we set up 
the notation, formulate the problem, and discuss potential issues. Then we 
examine the standard GEE methods in Section 3 and explore some WCR 
methods in Section 4. The NTDS data are analyzed in Section 5 using the 
appropriate methods. The paper concludes with a discussion in Section 6. 

2. Formulation. Let Yij (i = 1, . . . ,n; j = 1, . . . ,ki) denote the number 
of events that occur during the j'th trip by the ith subject. For reasons 
that will become clear later, we distinguish subject-level covariates such as 
gender from trip-level covariates such as time since licensure, writing Zi for 
the former group of covariates and Xij for the latter. Note that Xij may 
include interactions between subject-level and trip-level covariates. With rriij 
denoting the mileage of the jth trip by the ith subject, the marginal model 
of interest to us may be written as 

(1) E(y,j) = ruij exp(z/ + a'Z, + /3'Xy), 

where i/, a and (3 are unknown parameters, the latter two being of primary 
importance. Without loss of generality, we treat the covariates as fixed. 

Estimation of the parameters in model (1) is challenged by the fact that 
the Yij from the same subject tend to be correlated due to considerable 
variability between drivers as well as serial correlation (over time) within 
drivers. These sources of correlation are often accounted for using random 
effects [e.g., Zeger (1988), Davis, Dunsmuir and Wang (2000)]. For example, 
one might postulate that, conditional on the random effects bi, Cij and e^j, 
the Yij are independent and each follows a Poisson distribution with condi- 
tional mean 

(2) E{Yij\bi,Cij,eij) = ruij exp{v* + a Zi + fi' Xij -F 6j + Qj + ejj), 

where hi induces some heterogeneity between drivers (beyond that explained 
by Zj), Cij generates serial correlation among trips close in time, and e^j 
accounts for any additional overdispersion relative to the Poisson model. It 
is often assumed that the Qj for a given subject arise from a subject-specific 
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stochastic process Cj through the relationship Cij = Ci{tij), where tij denotes 
the time of the jth trip. Note that model (2) is consistent with model (1) 
because it implies the same mean structure after integrating out the random 
effects, provided the distributions of the random effects do not depend on 
the covariates. 

For the NTDS data, it seems reasonable to equip model (2) with the 
following distributional assumptions. One might assume that bi ~ A^(0,cr^), 
Cij ~ A^(0,(Tg), and q is a zero-mean Gaussian process with a covariance 
structure given by 

(3) cov{ci(ii),Ci(i2)} = o-^exp(-7|ii -^21)- 

The random effects bi and e^j and the process Cj are assumed independent 
of each other, although the Cij = Ci{tij) are necessarily correlated within 
each subject. The parameter 7 > determines how rapidly the serial corre- 
lation decreases with the gap time. The process Cj is known as the Ornstein- 
Uhlenbeck process [Uhlenbeck and Ornstein (1930)], and the above distri- 
butional assumptions will be collectively referred to as the Gauss-Ornstein- 
Uhlenbeck-Poisson (GOUP) model. It is possible to perform a maximum 
likehhood analysis under the GOUP model [e.g., Chan and Ledolter (1995)]. 
However, such an analysis can be computationally demanding because of 
the serial correlation, and the resulting inference may be sensitive to the 
distributional assumptions involved. The methods to be considered for our 
marginal analysis will not require the full strength of the GOUP model, and 
they may or may not involve the distributional assumptions in the GOUP 
model. Some of the methods we consider do incorporate such distributional 
assumptions into estimating equations through a working covariance matrix, 
in which case we also assess the robustness of the resulting inference against 
misspecification of the correlation structure. Our goal is to make valid and 
reasonably efficient inference on a and /3 in model (1) with minimal depen- 
dence on the distributional assumptions in the GOUP model. 

Without the distributional assumptions, model (2) does play a crucial role 
in this paper, as a way to conceptualize the different sources of variability. We 
assume that the different subjects are independent of each other and that the 
random effects bi and (ejj) -^^^ and the random process Cj are independent of 

each other within each subject. It follows that the Yij (j = 1, . . . , ki) are con- 
ditionally independent given bi and q . Another important implication is that 

(4) E{Yij\bi) = ruij exp{ui + P'Xij), 
where 

(5) f j = I/* + a'Zi + bi + log[E{exp(cjj + eij)}] = const. + a'Zi + bi. 

This shows that the correlation due to bi can be removed by treating sub- 
ject as a fixed effect. Obviously, this approach would not work in the usual 
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asymptotic theory assuming a large number of subjects and a limited num- 
ber of observations per subject. However, it might be appropriate when the 
number of subjects is small and the number of observations per subject is 
large as in the NTDS. Note that the subject-specific model (4) does not 
directly involve a, which has been absorbed into the subject-specific inter- 
cept Vi; thus, a cannot be estimated directly by fitting model (4). Nonethe- 
less, equation (5) suggests that once Ui has been estimated, say, by z?j, it 
should be possible to estimate a from a subject-level linear model regress- 
ing Di on Zi. 

3. Standard GEE methods. 

3.1. GEE with working independence. Assuming working independence, 
a standard GEE analysis can be performed with or without fixed subject 
effects (FSE), using either the robust variance estimate or the model-based 
variance estimate. This gives rise to four possible methods for estimating /3. 
As discussed earlier, a is not directly estimable from a GEE analysis with 
FSE but can be recovered from a subsequent linear regression analysis based 
on (5). Let Pj denote the GEE estimate of fj, and assume that Pj is approxi- 
mately unbiased, that is, E(9j|fj) ~ mj. This, together with (5), implies that 

E{V,) = E{E{Vi\u,)} ^ E{ui) = const. + a'Zi. 

An approximately unbiased estimate of a can then be obtained from a sub- 
ject-level linear model regressing 9j on Zj. A standard least squares (LS) 
algorithm can be used to fit this model if the Pj can be treated as inde- 
pendent. In general, the Vi are correlated due to correlated errors in GEE 
estimation, even though the Vi are indeed independent of each other. This 
correlation can be taken into account using an iteratively reweighted least 
squares (IRLS) algorithm described in Appendix A. Thus, under working 
independence, a can also be estimated using four different methods (LS and 
IRLS with FSE, robust and model-based without FSE). 

These methods are evaluated and compared in a simulation study mim- 
icking the NTDS. Specifically, each simulated data set consists of 40 sub- 
jects and 60,000 trips (1500 per subject). The study duration is rescaled to 
the unit interval, over which the trips are uniformly distributed. The off- 
set log(mjj) follows a normal distribution with mean 1 and variance 1, Zi 
is generated as a Bernoulli variable with success probability 0.5, and Xij 
is taken to be the trip time (i.e., Xij = tij). Given the covariates, the out- 
come Yij is generated according to the GOUP model described in Section 2, 
with cr^ = o"^ = (Tg = 1 and 7 = 50 or 300, corresponding to longer- and 
shorter-lived serial correlation, respectively. These values are based on the 
NTDS data (see Section 5) and the wide range for 7 reflects a large amount 
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Table 1 

Estimation of a, the effect of a subject-level covariate, using GEE methods with working 

independence. The methods are described in Section 3.1 and Appendix A, and compared 

in terms of empirical bias, standard deviation (SD), median standard error (SE) and 

coverage probability (CP) of intended 95% confidence intervals. Each entry is based on 

1,000 replicates 



Mean 


Serial 


Include 


Further 


Emp. 


Emp. 


Median 


Emp. 


count 


correlation 


FSE? 


option 


bias 


SD 


SE 


CP 


1 


Short 


No 


Robust 


0.00 


0.39 


0.33 


0.90 






No 


Model-based 


0.00 


0.40 


0.03 


0.14 






Yes 


LS 


0.00 


0.33 


0.32 


0.95 






Yes 


IRLS 


0.01 


0.32 


0.31 


0.95 




Long 


No 


Robust 


0.00 


0.42 


0.34 


0.90 






No 


Model-based 


-0.01 


0.42 


0.03 


0.11 






Yes 


LS 


-0.01 


0.33 


0.33 


0.94 






Yes 


IRLS 


0.00 


0.32 


0.32 


0.95 


0.1 


Short 


No 


Robust 


0.02 


0.41 


0.33 


0.90 






No 


Model-based 


0.01 


0.40 


0.04 


0.17 






Yes 


LS 


-0.01 


0.35 


0.33 


0.94 






Yes 


IRLS 


-0.01 


0.36 


0.32 


0.93 




Long 


No 


Robust 


-0.01 


0.42 


0.34 


0.89 






No 


Model-based 


0.00 


0.40 


0.04 


0.15 






Yes 


LS 


0.01 


0.35 


0.33 


0.94 






Yes 


IRLS 


0.00 


0.33 


0.32 


0.95 



of variability in its estimation (see Section 3.2). We set q = /3 = for sim- 
plicity because the results change little over a range of realistic values for 
these parameters. We choose v* in (2) such that the marginal mean of Yij 
equals a specified value (0.1 or 1). This range for E(lij) covers most kine- 
matic measures in the NTDS with varying thresholds. Some measures are 
associated with larger counts, which are generally easier to deal with. In 
each scenario (combination of parameter values), 1,000 replicate samples 
are generated and analyzed using the methods described in the preceding 
paragraph. 

Table 1 compares the four methods for estimating a in terms of empirical 
bias, standard deviation, median standard error and coverage probability of 
intended 95% confidence intervals. All methods in Table 1 are nearly unbi- 
ased, suggesting that bias is not of concern here. In terms of efficiency, the 
most important factor is clearly the use of FSE, which consistently results in 
smaller standard deviations. With FSE in the model, one might expect the 
IRLS estimate of a, which accounts for the correlation among the z?j, to be 
more efficient than the LS estimate. However, their difference seems rather 
small in Table 1, for two reasons. First, the variability due to estimating 
the Vi is dominated by the variability in the Vi, resulting in weak correla- 
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tion among the Pj in this particular example. Second, a referee pointed out 
that any correlation that does exist among the Di should be approximately 
exchangeable, in which case the LS estimate is still efficient. The precision 
in estimating a appears insensitive to the length of the serial correlation 
(specified through 7), although the FSE estimates do seem to become more 
variable for low counts [E(l^j) = 0.1]. The use of FSE also helps with vari- 
ance estimation. Without FSE, the model-based variance estimate is clearly 
disastrous, as expected, and even the robust variance estimate is unsatisfac- 
tory, with sub-nominal coverage (~90%). Including FSE in the GEE model 
leads to reasonable variance estimates and nearly correct coverage, under 
both (LS and IRLS) approaches. Thus, it seems that the key to valid and 
efficient inference about a in similar situations is to include FSE in a GEE 
analysis followed by a subject-level linear regression analysis for the esti- 
mated FSE. 

Estimation of /3 is a different story, as shown in the top section of Table 2. 
As in the case of estimating a, bias is not a major issue in estimating f3. The 
precision in estimating /3 depends mostly on the length of the serial corre- 
lation, with better precision for shorter-lived serial correlation, and seems 
insensitive to other factors (e.g., FSE, mean count). This is clearly different 
from the situation in Table 1, and an intuitive explanation is the following. 
In general, estimating the effect of a subject-level covariate is essentially 
comparing one group of subjects with another, while estimating the effect 
of a trip-level covariate is essentially comparing one group of trips with an- 
other group of trips by the same subjects. With Xij = tij, estimation of (3 
is basically a comparison of later trips with earlier trips, and it seems nat- 
ural that serial correlation has a larger impact on this comparison than on 
a comparison of boys with girls, say. The latter comparison, on the other 
hand, is more likely to benefit from the use of FSE to separate the relevant 
information (i.e., overall incidence rates of individual drivers) from the noise 
(i.e., within-subject variation). In Table 2, none of the four methods is satis- 
factory in terms of coverage, though the robust variance estimate performs 
better than the model-based one. Several alternatives to the robust variance 
estimate have also been explored with little success (see Section 6). 

3.2. GEE based on an estimated covariance matrix. We now consider 
GEE methods incorporating the covariance matrix for the Yij. Under the 
GOUP model described in Section 2, it is straightforward to show, by ap- 
pealing to well-known properties of normal and Poisson distributions, that 

E{Yij) = m,j expji^* + olZi + Xi^ + [al + al + f7^)/2} =: ^x^^-, 

(6) var(yij) = inj + /i^_,{exp(o-^ + c^c + o-g) - 1}> 

coviJij.Yiji) = fiijfiij' [exp{af + a^ exp(-7|% - %/ 1)} - 1] , 
(7) 

(Jt^/)- 
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Table 2 

Estimation of /3, the effect of a trip-level covariate, using standard GEE methods. The 

methods are described in Sections 3.1-3.3 and compared in terms of empirical bias, 

standard deviation (SD), median standard error (SE), coverage probability (CP) of 

intended 95% confidence intervals and the percentage of estimates that are not available 

(NA) due to numerical problems such as noninvertible matrices. In each scenario, 1,000 

replicates are generated, and the NA estimates are counted and then excluded in 

calculating the other summary statistics 



Mean Serial Include Varicince Emp. Emp. Median Emp. 

count correlation FSE? estimate bias SD SE CP 



%NA 



Working independence 



1 


Short 


No 


Robust 


0.00 


0.12 


0.09 


0.91 


0.0 






No 


Model-based 


0.00 


0.12 


0.05 


0.67 


0.0 






Yes 


Robust 


-0.01 


0.12 


0.09 


0.90 


0.0 






Yes 


Model-based 


0.00 


0.12 


0.04 


0.47 


0.0 




Long 


No 


Robust 


0.01 


0.21 


0.16 


0.92 


0.0 






No 


Model-based 


-0.01 


0.21 


0.05 


0.41 


0.0 






Yes 


Robust 


-0.01 


0.21 


0.16 


0.91 


0.0 






Yes 


Model-based 


-0.01 


0.21 


0.04 


0.28 


0.0 


0.1 


Short 


No 


Robust 


0.00 


0.13 


0.10 


0.90 


0.0 






No 


Model-based 


-0.01 


0.13 


0.07 


0.74 


0.0 






Yes 


Robust 


0.00 


0.13 


0.10 


0.90 


0.0 






Yes 


Model-based 


0.00 


0.13 


0.06 


0.65 


0.0 




Long 


No 


Robust 


0.02 


0.21 


0.17 


0.90 


0.0 






No 


Model-based 


0.00 


0.21 


0.07 


0.49 


0.0 






Yes 


Robust 


-0.01 


0.21 


0.17 


0.91 


0.0 






Yes 


Model-based 


-0.01 


0.21 


0.06 


0.42 


0.0 








Estimated covariance matrix 








1 


Short 


Yes 


Robust 


0.00 


0.07 


0.07 


0.95 


0.0 






Yes 


Model-based 


-0.01 


0.07 


0.07 


0.95 


0.0 




Long 


Yes 


Robust 


-0.01 


0.12 


0.12 


0.94 


0.0 






Yes 


Model-based 


0.00 


0.12 


0.11 


0.90 


0.1 


0.1 


Short 


Yes 


Robust 


0.00 


0.09 


0.10 


0.92 


4.6 






Yes 


Model-based 


0.00 


0.10 


0.10 


0.95 


4.9 




Long 


Yes 


Robust 


0.00 


0.15 


0.16 


0.90 


3.5 






Yes 


Model-based 


-0.01 


0.15 


0.12 


0.87 


3.9 








Misspecified covariance matrix 








1 


Varying 


Yes 


Robust 


-0.02 


0.09 


0.09 


0.93 


0.0 






Yes 


Model-based 


-0.03 


0.09 


0.08 


0.92 


0.0 


0.1 


Varying 


Yes 


Robust 


-0.02 


0.11 


0.12 


0.90 


3.7 






Yes 


Model-based 


-0.02 


0.11 


0.11 


0.93 


2.9 
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These expressions provide the marginal covariance matrix relevant in a GEE 
analysis without FSE. With FSE in the model, we need to condition on bi 
and the relevant formulas become 

E{Yij \bi) = ruij exp(f j + /3'Xij) =: fiij^^, , 

(8) var{Yij \bi) = fi.j^i,^ + ^,J^.|^^{exp(o-^ + a^) - 1}, 

cov {Yij,Yij>\bi) = /_fij|b^/_fij/[6.[exp{o-c exp(-7|iij - tij>\)} - 1], 
(9) 

with i^i defined in Section 2. Note that u^ is not involved in the covariance 
matrix in a GEE analysis with FSE. Unknown parameters in the covariance 
matrix (o"^, Ug, 7 and possibly a^) can be estimated by applying moment 
methods and nonlinear regression techniques to residuals from a preliminary 
GEE analysis with working independence (see Appendix B for details). This 
preliminary GEE analysis can be performed with or without FSE, regardless 
of the primary GEE analysis for estimating /?. With FSE in the preliminary 
GEE analysis, the aforementioned techniques do not provide an estimate 
of cj^. If desired, an estimate of o"^ can be obtained from the IRLS estimation 
of a or simply as the error variance in the LS analysis, ignoring the fact 
that Vi is an error-prone estimate of i^i (see Section 3.1 and Appendix A for 
details). Thus, the covariance matrix can be estimated using three methods: 
FSE-LS, FSE-IRLS and no FSE, with the first two differing only in the 
estimate of a^ . 

Table 3 shows the performance of the above three methods for estimating 
parameters in the covariance structure, in terms of bias, standard devia- 
tion and convergence. The methods are evaluated in the same scenarios as 
the previous simulation experiments, with the addition of a higher mean 
count (10) to show a more complete picture. For a mean count of 10, the 
FSE methods are nearly unbiased for the variance components but biased 
for 7 in a way that results in underestimation of the length of the serial 
correlation. The bias for 7 is larger for longer-lived serial correlation. The 
no FSE method is more biased for 7 (in the same direction) and even visi- 
bly biased for o"^. With decreasing counts, the bias problem becomes more 
severe, especially for 7, to the extent that estimation of the serial correla- 
tion is meaningless for a mean count of 0.1. The problem is compounded 
by a large amount of variability in estimating 7. Another issue with low 
counts is convergence in the nonlinear regression. For a mean count of 0.1, 
the convergence failure rates are approximately 8% and 13% (with short- 
and long-lived serial correlation, resp.) for the FSE methods and even worse 
for the no FSE method. The FSE methods appear more reasonable than the 
no FSE method, and we will use FSE-LS, which is simpler than FSE-IRLS, 
in the subsequent simulations. 
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Table 3 

Estimation of parameters in the covariance matrix {al,a^,a'^,'y) using moment and 

nonlinear regression methods. The methods are described in Appendix B and compared in 

terms of empirical bias, standard deviation (SD) and the percentage of estimates that are 

not available (NA) due to convergence failure in nonlinear regression. In each scenario, 

1,000 replicates are generated, and the NA estimates are counted and then excluded in 

calculating the other summary statistics 



Serial 








Emp. bias 




Emp. SD 




correlation 


Method 


crl 


ctI 


o-e 7 


crl 


'yl 


<rl 


7 


%NA 










Mean count = 10 












Short 




FSE-LS 


-0.01 


-0.03 


-0.02 l.lE-(-01 


0.23 


0.10 


0.11 


1.4E+02 


0.1 


(7 = 


:300) 


FSE-IRLS 


-0.02 


-0.03 


-0.02 l.lE-fOl 


0.23 


0.10 


0.11 


1.4E+02 


0.1 






NoFSE 


-0.26 


0.01 


-0.06 4.1E-I-02 


0.25 


0.29 


0.22 


5.9E+03 


6.8 


Long 




FSE-LS 


0.03 


-0.10 


-0.02 2.0E4-01 


0.24 


0.07 


0.08 


4.9E+01 


0.2 


(7 = 


= 50) 


FSE-IRLS 


0.02 


-0.10 


-0.02 2.0E+01 


0.24 


0.07 


0.08 


4.9E+01 


0.2 






NoFSE 


-0.30 


0.01 


-0.06 1.8E+02 


0.26 


0.40 


0.17 


2.1E+03 


5.8 










Mean count — 1 












Short 




FSE-LS 


-0.01 


-0.03 


0.00 1.2E-f02 


0.23 


0.21 


0.23 


2.9E-h03 


0.3 


(7 = 


= 300) 


FSE-IRLS 


-0.02 


-0.03 


0.00 1.2E+02 


0.23 


0.21 


0.23 


2.9E4-03 


0.3 






NoFSE 


-0.29 


0.01 


-0.06 2.6E+02 


0.23 


0.24 


0.21 


4.5E+03 


8.7 


Long 




FSE-LS 


0.03 


-0.10 


-0.02 3.4E-H01 


0.25 


0.10 


0.21 


8.5E+01 


0.4 


(7 = 


= 50) 


FSE-IRLS 


0.01 


-0.10 


-0.02 3.4E-h01 


0.24 


0.10 


0.21 


8.5E+01 


0.4 






NoFSE 


-0.30 


0.05 


-0.08 2.1E-f02 


0.28 


0.97 


0.20 


1.5E+03 


17.5 










Mean count = 0.1 












Short 




FSE-LS 


0.28 


-0.03 


-0.13 6.3E+02 


2.97 


0.51 


1.18 


4.1E+03 


7.9 


(7 = 


= 300) 


FSE-IRLS 


0.25 


-0.03 


-0.13 6.3E+02 


2.97 


0.51 


1.18 


4.1E+03 


7.9 






NoFSE 


-0.28 


0.08 


-0.12 6.5E+02 


0.35 


0.39 


0.39 


3.6E+03 


30.4 


Long 




FSE-LS 


0.14 


-0.04 


-0.14 1.0E-f03 


1.65 


0.51 


1.17 


1.4E-f04 


12.9 


(7 = 


= 50) 


FSE-IRLS 


0.08 


-0.04 


-0.14 1.0E-h03 


1.24 


0.51 


1.17 


1.4E-h04 


12.9 






NoFSE 


-0.34 


0.14 


-0.14 2.2E-F03 


0.33 


0.58 


0.35 


3.3E-F04 


37.3 



GEE methods based on covariance matrices estimated through FSE-LS 
are evaluated in the same setting described in Section 3.1. When the nonlin- 
ear regression in FSE-LS fails to converge, the data-driven initial values will 
be taken as the final estimates [see equation (15) in Appendix B]. With thou- 
sands of trips per subject, inverting a covariance matrix is time-consuming, 
and iterating until convergence in the usual Fisher scoring algorithm [Fitz- 
maurice et al. (2008), Section 3.2.4] would be prohibitive. We therefore settle 
for a one-step estimate of the regression coefficients based on just one itera- 
tion in the Fisher scoring algorithm. Specifically, we start with a GEE anal- 
ysis with working independence, use the residuals to estimate parameters in 
the covariance matrix, and update the regression coefficients just once us- 
ing the estimated covariance matrix. Under this approach, it is necessary to 
include FSE in each GEE analysis; otherwise meaningful simulation results 
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cannot be produced. This is not surprising because inversion of large covari- 
ance matrices can be difficult when the correlation is strong (without FSE) 
and poorly estimated. This is not a serious limitation because the use of 
FSE leads to better efficiency and coverage anyway, which we have observed 
in simulations where the true covariance matrix is used in GEE (results 
not shown). Once FSE are included, numerical problems become much less 
frequent (3~5% for a mean count of 0.1, <1% for higher counts). 

We have found that incorporating the covariance matrix into the estimat- 
ing equation does little to improve the estimation of a in terms of efficiency 
and coverage. We therefore omit the results for estimating a and henceforth 
focus on estimating /3. The results for the latter, reported in the middle 
section of Table 2, show that better efficiency is attained using an estimated 
covariance matrix than under working independence. In terms of coverage, 
the robust variance estimate performs well for higher counts but not for 
lower ones, especially when the serial correlation is long-lived. The model- 
based variance estimate performs well for short-lived serial correlation and 
poorly for long-lived serial correlation. The former case shows that underes- 
timating the length of the serial correlation has little effect when the serial 
correlation is already negligibly short. Although not shown in Table 2, the 
model-based variance estimate has been found to work well in all cases if 
the true covariance matrix is used. Thus, the root of the problem appears 
to be the difficulty in estimating 7. 

3.3. GEE based on a misspecified covariance matrix. It is natural to ask 
how the GEE methods based on the GOUP model would perform if the 
model is misspecified. For the NTDS, the GOUP model seems plausible for 
the most part. It is possible, however, that the length of the serial correlation, 
which is assumed constant, might change over time. To formalize this notion 
of changing length, note that the covariance structure given by (3) can be 
generalized into 

(10) cov{ci(ti),Ci(t2)} = o-2expf- / -f{t)dt], ti<t2, 

for a function 7 : [0, 1] — >■ (0, 00). For a constant 7, this reduces to the original 
GOUP model. In the next set of simulation experiments, we will generate 
data using the generalized GOUP model with 7 taken to be a straight line 
from 7(0) = 300 to 7(1) = 50. Thus, instead of working exclusively with 
short- or long-lived serial correlation, we allow it to be short-lived at the 
beginning and to gradually become long-lived at the end. This is motivated 
by the conjecture that teenage driving will tend to be haphazard at the 
beginning and will become more consistent over time. Unfortunately, it ap- 
pears difficult to verify this conjecture directly using the data. Indeed, even 
when 7 is a constant, we have seen in Table 3 that it can be really difficult to 
estimate, especially when the counts are low. It seems reasonable to expect 
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that a time- varying 7 will be even more difficult to estimate. We therefore 
take a sensitivity analysis approach to this issue. The bottom section of Ta- 
ble 2 shows the performance in the present setting of GEE methods based 
on the original GOUP model. The point estimates are more (or less) effi- 
cient in this setting than if the serial correlation is consistently long-lived (or 
short-lived) . The coverage property of the model-based variance estimate is 
also intermediate between the two extremes. As before, the robust variance 
estimate works well for large counts but not for small ones. 

3.4. Summary. To summarize, valid inference on a can be made through 
a GEE analysis with FSE (regardless of the covariance matrix) followed by 
a linear regression analysis, while inference on /3 requires more work. For 
large counts, valid inference on f3 can be made by estimating the covari- 
ance matrix and using the robust variance estimate; in this case, the infer- 
ence seems robust with respect to the serial correlation structure. For small 
counts, the robust variance estimate is unsatisfactory and the model-based 
variance estimate performs well only for short-lived serial correlation. Be- 
cause the serial correlation is difficult to estimate with small counts, one 
would not know in practice whether the model-based variance estimate is 
appropriate for a particular application. Therefore, it is necessary to de- 
velop methods that perform well (or better) for small counts with short- 
and long-lived serial correlation. This is the focus of the next section, where 
we explore WCR-type methods. 

4. WCR and extensions. 

4.1. The standard WCR approach. WCR is a resampling approach for 
marginal analysis of clustered data, originally designed to deal with informa- 
tive cluster sizes [Hoffman, Sen and Weinberg (2001), Follmann, Proschan 
and Leifer (2003)]. In its original form, the approach can be applied to the 
present problem as follows. Suppose we sample a trip from each subject 
randomly and form a subset of nonclustered data 

(11) {{Zu„Xij^,Yij,),i = l,...,n}, 

where Jj is uniformly distributed on {1, . . . ,ki}, independently across i. 
There is no correlation in the subsample (11), which can therefore be ana- 
lyzed using a quasi-Poisson model with mean structure given by (1). This can 

be repeated a large number (say, L) of times to yield {{61, 'Si), I = I, ... ,L}, 
where 9i denotes the point estimate 0I6 = (a, /3) based on the Ith subsample, 
and S/ is the associated variance estimate. Then the WCR estimate of 9 is 
given by 



1 ^ ^ 

(12) ^WCR = Y/ ,^l 



L 

1=1 
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and the associated variance estimate is 

(13) SwcR = T 2^ ^' ~ T _ 1 / A^i ~ Gwcr){(^i - ^wcr)'- 

1=1 1=1 

The first term on the right-hand side of (13) measures the variability within 
subsamples, while the second term measures the variability between sub- 
samples. 

In general, for any WCR method to work, it is essential that valid estima- 
tes {9i, S/) be obtained from each subsample. Note that the estimates (0/, S/) 
are identically distributed. If 9i is biased for 9, then ^WCR given by (12) is 
also biased by the same amount. Similarly, if S^ underestimates the sampling 
variance of 9i, then SwcR given by (13) will underestimate the sampling 
variance of 0wcR by a similar amount, because the second term on the right- 
hand side of (13) will be approximately unbiased for large L. Furthermore, 
in terms of relative bias, the underestimation problem of 5]wcR is generally 
worse than that of S/, simply because the true variance of ^WCR is smaller 
than that of 9i. In light of these observations, we will study WCR methods in 
two steps, always considering a single subsample before moving to repeated 
sampling. 

4.2. WCR with simple random sam.pling (WCR-SRS). There is no rea- 
son why a WCR subsample must consist of exactly one observation from 
each subject. In fact, to fit a model with FSE, there has to be more than one 
observation from each subject. Without FSE, the standard WCR approach 
is feasible but, as we shall see, does not work well. We therefore consider 
an extended WCR approach where a simple random sample (SRS) is taken 
from each subject to form an WCR subsample. Thus, instead of a single 
number Jj, we take an SRS {Jj , ..., Jj } from the index set {l,...,/ci} 
for each i. The resulting subsample is 

{(^ijM , X.j(r) , y^j(r)),i = 1, . . . , n, r = 1, . . . , i?}, 

for which a standard GEE analysis can be performed, under working inde- 
pendence or using an estimated covariance matrix, with or without FSE, 
using the robust or model-based variance estimate. This can be repeated 
a large number of times, and the resulting estimates can be combined us- 
ing (12) and (13). If the covariance matrix for the Yij is used, it will be 
estimated once from the full sample and the estimate will be applied to all 
subsamples. 

This WCR-SRS approach is evaluated in Table 4. Given the findings in 
the last section, this evaluation is focused on small counts. Under working 
independence, only the robust variance estimate is considered because the 
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Table 4 

Estimation of j3, the effect of a trip-level covariate, using the WCR-SRS approach with 

different values of R (number of trips to sample from each subject) and L (number of 

repetitions in WCR). The methods are described in Sections 4-1 o-nd 4.2 and compared in 

terms of empirical bias, standard deviation (SD), median standard error (SE), coverage 

probability (CP) of intended 95% confidence intervals and the percentage of estimates 

that are not available (NA). The mean count is fixed at 0.1. In each scenario, 1,000 

'licates are generated, and the NA estimates are counted and then excluded in 

calculating the other summary statistics 



Serial Include Variance 

correlation FSE? estimate 



R 



Emp. 
bias 



Emp. 
SD 



Median Emp. 
SE CP 



%NA 



Short 



No 



Yes 



Long 



No 



Yes 



Robust 



Robust 



Robust 



Robust 



Working independence 



1 




-15.40 


357.22 


1.59 


0.64 


0.4 


5 




-0.02 


1.44 


0.99 


0.86 


0.0 


25 




0.03 


0.64 


0.52 


0.91 


0.0 


100 




-0.02 


0.37 


0.28 


0.91 


0.0 


500 




-0.01 


0.20 


0.15 


0.92 


0.0 


100 


500 


0.00 


0.13 


0.00 


0.16 


0.0 


5 




0.09 


1.59 


1.16 


0.87 


0.2 


25 




0.01 


0.68 


0.50 


0.87 


0.1 


100 




-0.02 


0.37 


0.28 


0.90 


0.1 


500 




0.00 


0.18 


0.14 


0.91 


0.0 


100 


500 


0.00 


0.12 


0.00 


0.18 


0.0 


1 




32.12 


2011.24 


1.64 


0.64 


0.8 


5 




-0.08 


1.43 


1.00 


0.89 


0.0 


25 




-0.01 


0.72 


0.53 


0.89 


0.0 


100 




0.01 


0.42 


0.31 


0.92 


0.0 


500 




0.00 


0.24 


0.20 


0.91 


0.0 


100 


500 


-0.01 


0.21 


0.10 


0.52 


0.0 


5 




-0.62 


19.13 


1.13 


0.85 


0.1 


25 




-0.02 


0.71 


0.52 


0.89 


0.1 


100 




0.00 


0.39 


0.30 


0.90 


0.0 


500 




-0.01 


0.24 


0.20 


0.90 


0.0 


100 


500 


-0.01 


0.21 


0.10 


0.56 


0.0 



model-based variance estimate is clearly invalid. As before, no meaningful 
results are available when an estimated covariance matrix is used without 
FSE. For each method, we begin by analyzing a single subsample, with L = 1 
and different values of R (size of the SRS from each subject). Table 4 shows 
poor performance for R=l (i.e., standard WCR), apparently due to er- 
ratic estimates, which are likely to arise when the subsample is very small. 
Even for R = 5, the point estimate is visibly biased in some cases, proba- 
bly because the subsample is still small. The coverage probability generally 
increases with R, at least for R < 100. The best coverage is seen when an 
estimated covariance matrix is used together with FSE and the model-based 
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Table 4 
(Continued) 



Serial 


Include 


Variance 






Emp. 


Emp. 


Median 


Emp. 




correlation 


FSE? 


estimate 


R 


L 


bias 


SD 


SE 


CP 


%NA 






Estimated covariance matrix 








Short 


Yes 


Robust 


5 




0.03 


1.44 


0.89 


0.75 


0.4 








25 




0.02 


0.59 


0.48 


0.85 


1.4 








100 




0.01 


0.30 


0.29 


0.91 


3.3 








500 




-0.01 


0.15 


0.15 


0.92 


4.6 








100 


500 


0.00 


0.09 


0.21 


0.70 


2.6 






Model-based 


5 




-0.09 


1.49 


1.19 


0.91 


0.2 








25 




0.02 


0.58 


0.55 


0.94 


1.1 








100 




0.00 


0.29 


0.30 


0.94 


2.2 








500 




0.00 


0.15 


0.15 


0.97 


4.1 








100 


500 


0.01 


0.10 


0.12 


0.68 


1.4 


Long 


Yes 


Robust 


5 




-0.04 


1.48 


0.93 


0.76 


0.4 








25 




0.00 


0.61 


0.51 


0.87 


1.2 








100 




0.00 


0.31 


0.30 


0.90 


2.3 








500 




-0.01 


0.18 


0.19 


0.90 


3.1 








100 


500 


0.00 


0.15 


0.24 


0.69 


2.8 






Model-based 


5 




-0.02 


1.67 


1.22 


0.92 


0.4 








25 




0.01 


0.59 


0.54 


0.93 


1.9 








100 




-0.01 


0.32 


0.31 


0.94 


2.4 








500 




0.00 


0.18 


0.17 


0.92 


3.2 








100 


500 


0.00 


0.16 


0.16 


0.72 


2.6 



variance estimate. As explained in Section 3.2, this method tends to work 
well when the serial correlation is weak, which is precisely what happens to 
a randomly selected subset of trips, which tend to be farther apart from each 
other than in the full sample. This explains why better coverage is seen here 
than in the corresponding portion of Table 2. Of course, this trend will be 
reversed when the number of selected trips gets too large, which explains the 
slight drop in coverage probability when R increases further to 500. Overall, 
it appears that R = 100 would be a reasonable choice in terms of bias and 
coverage. Efficiency is not a major consideration here because the efficiency 
loss due to R < ki can be recovered through repeated sampling. Table 4 
also shows the performance of the WCR-SRS method with L = 500 and 
R = 100, whose efficiency levels appear similar to the corresponding results 
in Table 2, as expected. For the working independence methods, the WCR 
variance estimate is seriously biased downward and the coverage probability 
is far below the nominal level, confirming the conjecture in Section 4.1 that 
the variance underestimation problem of SwcR is generally worse than that 
of Xl/ in terms of relative bias. For methods based on an estimated covari- 
ance matrix, the WCR variance estimate is not necessarily biased downward 
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but the coverage properties are not good. This behavior is probably due to 
erratic estimates which arise frequently in repeated analyses of subsamples 
when a poorly estimated covariance matrix is used. 

Table 4 does suggest a potential benefit of WCR, that is, the ability to 
produce a sparse subset of trips with weaker serial correlation. This advan- 
tage can be exploited further by selecting trips that are separated from each 
other by a specified amount. Specifically, for a given positive integer S, we 
can choose a trip randomly among the first S + 1 trips and every {S + l)st 
trip from there on, until the end of the sequence. This can be done for each 
subject and the process can be repeated many times as before. This ap- 
proach does lead to good coverage in single outputation when an estimated 
covariance matrix is used together with FSE and the model-based variance 
estimate (results not shown). However, upon moving to WCR, this approach 
exhibits the same strange behavior as seen in Table 4 (results not shown), 
probably for the same reason. 

4.3. WCR with separated blocks (WCR-SB). The previous experiments 
with WCR show that variance underestimation can be a seriousj)roblem, es- 
pecially when the variance of ^wCR is much smaller than that of 9i . The latter 
difference can be reduced by including more observations in each subsam- 
ple, and the question then becomes how to ensure reasonable performance 
of the variance estimate based on a subsample. To this end, we propose 
the following WCR approach involving separated blocks. With FSE in the 
model, the only source of correlation is serial correlation, which is weaker for 
trips farther apart. Thus, with an appropriate amount of separation, trips 
from the same subject can be treated as approximately independent. Let B 
denote the block size (i.e., the number of consecutive trips to be analyzed 
together as a block) and S the separation (i.e., the number of trips used 
to separate the blocks). For a given subject, one possible set of separated 
blocks can be formed by taking the first B trips, skipping the next S trips, 
taking the next B, skipping the next S, and so on. Each group of B trips 
sampled together will be treated as a block and the different blocks will be 
treated as independent in the subsequent GEE analysis. A random shift can 
be added to this sampling process, and the random sampling process can be 
repeated many times as before. 

Preliminary simulation results, as well as those in Table 4, suggest that 
a reasonable block size would be -B = 100. Simple calculations based on (8) 
and (9) show that, with S = 50, the correlation between trips in different 
blocks is typically well below 0.05 in all scenarios considered in this paper. 
With these choices for B and S, the WCR-SB approach is evaluated through 
simulations and the results are presented in Table 5. We focus on working 
independence to avoid numerical stability issues, and only consider the ro- 
bust variance estimate. The method appears to have better coverage when 
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Table 5 

Estimation of /3, the effect of a trip-level covariate, using the WCR-SB approach with 

working independence and the robust variance estimate, implemented with B = 100 (block 

size), S = 50 (separation) and L = 1 (single subsample) or 50. The method is evaluated 

m terms of empirical bias, standard deviation (SD), median standard error (SE), 

coverage probability (CP) of intended 95% confidence intervals and the percentage of 

estimates that are not available (NA). The mean count is fixed at 0.1. In each scenario, 

1,000 replicates are generated, and the NA estimates are counted and then excluded in 

calculating the other summary statistics 



Serial 




Emp. 


Emp. 


Median 


Emp. 




correlation 


L 


bias 


SD 


SE 


CP 


%NA 


Short 


1 


0.00 


0.15 


0.13 


0.94 


0.0 




50 


0.00 


0.12 


0.10 


0.92 


0.0 


Long 


1 


0.00 


0.22 


0.19 


0.92 


0.0 




50 


0.01 


0.20 


0.17 


0.92 


0.0 


Varying 


1 


0.00 


0.18 


0.15 


0.92 


0.1 




50 


0.00 


0.15 


0.13 


0.92 


0.0 



applied to a single set of separated blocks than to the full sample (Table 2), 
apparently owing to a larger number of approximately independent clusters 
(400 vs. 40). Note that the efficiency based on a single outputation is quite 
close to that in a full sample analysis. Considering this and the computa- 
tional demand, the WCR version is implemented with L = 50 repetitions, 
and the resulting WCR-SB estimates are virtually as efficient as the full 
sample estimates. In the case of long-lived serial correlation, better cover- 
age (92%) is achieved here using the working independence method than 
in any of the previous simulations. The improvement (2% over Table 2) is 
not dramatic but still substantial, and similar phenomena have been ob- 
served consistently in other simulation experiments under similar scenarios. 
Table 5 also shows the performance of the WCR-SB method when the serial 
correlation is generated according to (10) rather than (3). The only notice- 
able effect of this change is an intermediate level of efficiency (between the 
extreme cases of short- and long-lived serial correlation) . 

5. Analysis of NTDS data. Our analysis of the NTDS data concerns the 
following elevated 5f-force events: rapid start (longitudinal acceleration > 
0.35g), hard stop (longitudinal deceleration > 0.40^), hard left/right turn 
(lateral deceleration/acceleration > 0.45^), yaw (>5 degrees within 3 sec- 
onds), as well as a composite measure defined as the totality of these 5 types 
of events. Yaw is a measure of correction after a turn and is calculated as 
the absolute change in angle between an initial turn and the correction. For 
each specific g-ioice event, the threshold (in parenthesis) is chosen to allow 
meaningful differences between drivers and different driving conditions to be 
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revealed and estimated with a sufficient number of events. The mean counts 
based on the chosen thresholds are 0.13 (rapid start), 0.18 (hard stop), 0.20 
(hard left turn), 0.15 (hard right turn), 0.04 (yaw) and 0.70 (composite 
measure). The mean count for yaw is relatively low, perhaps too low for the 
methods discussed in this paper, but the corresponding threshold is already 
the lowest for which we have data available. This limitation should be kept 
in mind when interpreting the results. 

From the viewpoint of behavior science, it is reasonable to expect some 
serial correlation because risky driving may be related to the weather and 
the driver's mental and physical conditions (e.g., mood and fatigue). All of 
these factors could result in serial correlation unless they are all included in 
the model, which is impossible in the NTDS due to the lack of this infor- 
mation. To explore the nature of the serial correlation in the NTDS data, 
a marginal model with FSE and all relevant covariates (to be described 
later) is fit for the composite measure under working independence. Pair- 
wise products of standardized residuals are calculated for consecutive trips 
(because the number of all possible pairs is too large). After sorting by gap 
time, these pairs are grouped into 100 bins for further reduction. Within 
each bin, we calculate the median gap time and the mean product of stan- 
dardized residuals, and the results are plotted in Figure 2 together with 
a lowess smooth. Figure 2 suggests that some serial correlation is present in 



o 
O 




1 2 3 

Gap time (day) 



Fig. 2. Serial correlation for the composite measure in the NTDS. Pairwise products of 
standardized residuals are calculated for consecutive trips and, after sorting by gap time, 
grouped into 100 bins. Each circle represents the median gap time and the mean product 
of standardized residuals in a bin, and the solid line is a lowess smooth. 
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Fig. 3. Incidence rates of various g-force events (rapid start, hard stop, hard left/right 
turn, yaw, and the composite measure) as functions of time since licensure, estimated using 
regression splines (one node for every 3 months). 



the data, although a precise characterization of the serial correlation seems 
difficult. Under the GOUP model and using the techniques described in Ap- 
pendix B, the estimated variance components are generally close to 1 (for 
the composite measure and other (7- force events), while the estimate of 7 
varies wildly and depends heavily on the initial value. 

Figure 3 presents the estimated incidence rate (IR) per 100 miles of each 
(7- force event as a function of time since licensure. The IR is defined by (1) 
with rriij replaced by 100, Zi empty, and Xij consisting of indicators of cal- 
endar month and a set of regression splines for tij (one knot for every 3 
months). Calendar month is included in the model to adjust for a possible 
seasonal effect, which may confound with the effect of time since licensure 
because the enrollment of subjects was not uniform with respect to calen- 
dar month, with more subjects enrolled in the fall and fewer in the spring. 
The estimates shown in Figure 3 represent geometric means of the monthly 
estimates. Note that the IR considered here is a marginal quantity which 
involves the marginal intercept v and which should therefore be estimated in 
a model without FSE. The estimates in Figure 3 are obtained under work- 
ing independence, because the GOUP model without FSE is problematic 
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to fit and its performance is not well understood (see Section 3.2). Under 
the regression spline model, there is some ambiguity as to whether the IR 
for the composite measure changes over time. The p- value for the regression 
splines is 0.49 based on the robust variance estimate under working indepen- 
dence, although the validity of this test may be questionable (see Section 3). 
Under the GOUP model with FSE, the IR for the composite measure does 
appear to change over time {p = 0.0014 for the robust variance estimate, 
p < 0.0001 for the model-based variance estimate). The latter observation is 
in contrast to a previous report that risky driving largely remains constant 
[Simons- Morton et al. (2011a)], and the difference is probably due to the 
lower thresholds used here, which lead to more events and perhaps more 
relevant information. Despite the ambiguity about statistical significance, 
the main conclusion from our marginal analysis is fairly consistent with the 
findings of Simons-Morton et al. (2011a). The latter reference reports IRs 
for adults/parents that are consistently and substantially lower than those 
for novice teenagers driving the same vehicles. In Figure 3, the estimated 
IR of the composite measure for teenage drivers increases over the first 6 
months, declines over the next few months, and then increases again, sug- 
gesting the establishment of a risky driving style. This general impression 
is reinforced by the plots for the specific g-force events, which show some 
temporal changes but no clear trends. 

As noted by a referee, some instances of risky driving (e.g., mistakes due to 
the lack of skills) may depend more on the amount of practice, which can be 
quantified by accumulated mileage, than on time since licensure. This raises 
the question of which measure is more appropriate to adjust for. Accumu- 
lated mileage is a measure of driving experience, while time since licensure 
reflects both experience and maturity. From a public health perspective, we 
believe that time since licensure is more relevant to policy makers. Nonethe- 
less, all analyses in this section have been repeated with time since licensure 
replaced by accumulated mileage, and the results are very similar and there- 
fore omitted. With regard to Figure 3, this similarity suggests that persistent 
risky driving among teenage drivers is not only due to inexperience but also 
to immaturity (i.e., aspects of adolescent development that motivate novice 
young drivers to seek excitement and under-recognize risks). 

Each g-force event is further analyzed in a larger model that simultane- 
ously adjusts for the following covariates: driver's gender, risky friends, time 
of day, passenger condition, calendar month, and time since licensure. The 
risky behavior of a teenage driver's friends is assessed using a 7-item index 
asking "How many of your friends would you estimate. . . smoke cigarettes, 
drink alcohol, get drunk at least once a week, use marijuana, drive after hav- 
ing two or more drinks in the previous hour, exceed speed limits, and do not 
use safety belts (none, a few, some, most, all)." The assessment was made at 
4 time points (baseline, 6, 12 and 18 months), the 4 scores were averaged for 
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each driver, and the average score was then dichotomized according to the 
median spht among all drivers in the study. Night driving was determined 
from video data by visual observation of the ambient natural lighting at the 
start of the trip. Late night was defined as 10 pm to 6 am, and night trips 
that did not start at late night were considered early night trips. The pres- 
ence and relative age (adult or teen) of each passenger was determined by 
the coders examining the video data. In summary, gender and risky friends 
enter the marginal model as subject-level binary covariates, time of day is 
a trip- level covariate with 3 levels (day, early night and late night), passenger 
condition is also a trip-level covariate with 3 levels (none, teens, at least one 
adult), calendar month is a trip-level covariate with 12 levels, and time since 
licensure is represented by the same set of regression splines as before. No 
signs of model misspecification are observed in residual plots (not shown). 

Table 6 presents estimates of incidence rate ratios (IRR) that quantify the 
association of each g'-force event with the aforementioned covariates (except 
calendar month and time since licensure). An IRR is just the result of ex- 
ponentiating the corresponding regression coefficient in the marginal model. 
The estimates are obtained using standard GEE methods (working inde- 
pendence or the GOUP model, robust or model-based variance estimate) 
as well as the proposed WCR-SB method (working independence, B = 100, 
S = 50, L = 50). All of these methods include FSE. In a sensitivity analysis, 
a different separation size (S* = 100) for the WCR-SB method is used to 
produce similar results, which are therefore omitted. Since the point esti- 
mates depend primarily on the working correlation structure, only 2 sets of 
point estimates are shown in Table 6. For the subject-level covariates, the 
LS method is used to obtain 2 sets of 95% confidence intervals (for working 
independence and the GOUP model). For the trip- level covariates. Table 6 
presents 4 sets of 95% confidence intervals corresponding to the robust vari- 
ance estimate and the WCR-SB method under working independence as well 
as the robust and model-based variance estimates under the GOUP model. 
Table 6 contains some unrealistically large values (>100, labeled as oo) for 
the IRRs associating risky friends and gender with yaw and rapid start, 
probably due to very few events in the subgroup of teenage drivers acting 
as the denominator. 

Despite some numerical differences between the different methods, the re- 
sults in Table 6 indicate clearly that teenage risky driving is not associated 
with gender, positively associated with risky friends, negatively associated 
with early night, and negatively associated with the presence of passengers 
(more strongly for adults than for teens). The evidence is not conclusive 
with regard to late night, whose association with risky driving is significant 
in some cases but not in others. The association of risky friends with risky 
driving points to potentially destructive effects of risk-accepting social norms 
and social identity on risky behavior. Both the perception and the actuality 
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Table 6 

Analysis of NTDS data using standard GEE methods (working independence or the 

GOUP model, robust or model-based variance estimate) and the proposed WCR-SB 

method (working independence, _B = 100, S = 50, L = 50). Each outcome variable (rapid 

start, hard stop, hard left/right turn, and a composite measure) is analyzed separately in 

a marginal model that simultaneously adjusts for gender, risky friends, time of day, 

passenger condition, calendar monthl and time since licensure (the last two as 

confounders) . The results are summarized m terms of incidence rate ratios (IRRs) and 

95% confidence intervals. The symbol cxd denotes a value that is unrealistically large 

(>100) 

GOUP model Working independence 



Covariate Comparison IRR Model-based Robust IRR Robust WCR-SB 

Composite measure {mean count = 0.70) 

Gender Male vs. female 1.16 (0.67,1.99) 1.07 (0.64,1.79) 

Risky friends More vs. fewer 1.91 (1.11,3.29) 1.82 (1.08,3.05) 

Time of day Early night vs. day 0.64 (0.61,0.68) (0.59,0.71) 0.73 (0.64,0.83) (0.69,0.78) 

Late night vs. day 0.76 (0.69,0.84) (0.66,0.88) 0.89 (0.68,1.16) (0.80,1.00) 

Passengers Teen vs. none 0.81 (0.78,0.85) (0.74,0.90) 0.78 (0.72,0.86) (0.75,0.83) 

Adult vs. none 0.39 (0.34,0.45) (0.34,0.46) 0.32 (0.26,0.40) (0.28,0.40) 

Rapid start (mean count = 0.13) 

Gender Male vs. female 0.12 (0.00,5.69) 0.12 (0.00,5.86) 

Risky friends More vs. fewer 12.33 (0.26,oo) 11.50 (0.26, oo) 

Time of day Early night vs. day 0.89 (0.83,0.95) (0.70,1.14) 0.89 (0.74,1.07) (0.83,0.95) 

Late night vs. day 0.89 (0.79,1.00) (0.68,1.15) 0.93 (0.77,1.12) (0.78,1.03) 

Passengers Teen vs. none 0.83 (0.78,0.88) (0.68,1.01) 0.81 (0.70,0.92) (0.75,0.87) 

Adult vs. none 0.32 (0.25,0.41) (0.21,0.50) 0.34 (0.26,0.44) (0.26,0.44) 

Hard stop {mean count =0.18) 

Gender Male vs. female 0.78 (0.44,1.40) 0.74 (0.42,1.30) 

Risky friends More vs. fewer 1.43 (0.80,2.56) 1.35 (0.77,2.38) 

Time of day Early night vs. day 0.68 (0.63,0.73) (0.60,0.77) 0.70 (0.62,0.80) (0.65,0.75) 

Late night vs. day 0.71 (0.61,0.82) (0.55,0.91) 0.70 (0.56,0.87) (0.61,0.80) 

Passengers Teen vs. none 0.90 (0.84,0.96) (0.82,1.00) 0.88 (0.79,0.97) (0.81,0.95) 

Adult vs. none 0.51 (0.42,0.63) (0.41,0.64) 0.38 (0.29,0.49) (0.32,0.47) 



of having risk-taking friends could contribute to the perceived acceptabihty 
of risky behavior. The association of early night with risky driving suggests 
that teens do recognize the danger of night driving (at least partially) and 
drive more carefully (or rather, less carelessly) at night. There is no contra- 
diction between this finding and the apparent ambiguity about late night 
because late night trips often take place under unusual circumstances. Fi- 
nally, it is worth noting that passengers, especially adult passengers, appear 
protective with respect to risky driving. Further research is warranted to 
confirm, better characterize and utilize such protective effects. 
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Table 6 
(Continued) 





Comparison 




GOUP model 


Working independence 


Covariate 


IRR 


Model-based Robust 


IRR 


Robust WCR-SB 




Hard left 


turn {mean count = 0.20) 






Gender 


Male vs. female 


1.36 


(0.72,2.55) 


1.29 


(0.69,2.38) 


Risky friends 


More vs. fewer 


2.41 


(1.28,4.53) 


2.20 


(1.19,4.07) 


Time of day 


Early night vs. day 


0.53 


(0.49,0.58) (0.45,0.63) 


0.64 


(0.52,0.79) (0.58,0.71) 




Late night vs. day 


0.78 


(0.67,0.92) (0.63,0.97) 


1.00 


(0.68,1.46) (0.83,1.18) 


Passengers 


Teen vs. none 


0.81 


(0.75,0.88) (0.70,0.94) 


0.76 


(0.68,0.84) (0.70,0.82) 




Adult vs. none 


0.38 


(0.30,0.49) (0.28,0.53) 


0.32 


(0.24,0.43) (0.25,0.40) 




Hare 


I right 


turn {mean count = 0.15) 






Gender 


Male vs. female 


1.39 


(0.72,2.69) 


1.31 


(0.69,2.49) 


Risky friends 


More vs. fewer 


2.04 


(1.05,3.95) 


1.92 


(1.01,3.65) 


Time of day 


Early night vs. day 


0.63 


(0.58,0.67) (0.50,0.79) 


0.69 


(0.55,0.85) (0.62,0.76) 




Late night vs. day 


0.83 


(0.74,0.93) (0.46,1.48) 


0.92 


(0.58,1.46) (0.75,1.14) 


Passengers 


Teen vs. none 


0.64 


(0.60,0.69) (0.54,0.76) 


0.62 


(0.55,0.70) (0.56,0.69) 




Adult vs. none 


0.26 

Yaw 


(0.20,0.33) (0.18,0.37) 
{mean count = 0.04) 


0.22 


(0.16,0.31) (0.17,0.30) 


Gender 


Male vs. female 


0.07 


(0.00, oo) 


0.07 


(0.00, oo) 


Risky friends 


More vs. fewer 


oo 


(0.15, oo) 


oo 


(0.17, oo) 


Time of day 


Early night vs. day 


0.92 


(0.76,1.10) (0.65,1.29) 


0.94 


(0.77,1.15) (0.80,1.11) 




Late night vs. day 


0.88 


(0.63,1.23) (0.42,1.86) 


0.92 


(0.59,1.45) (0.71,1.14) 


Passengers 


Teen vs. none 


1.19 


(1.02,1.39) (0.90,1.57) 


1.17 


(1.00,1.37) (1.02,1.33) 




Adult vs. none 


0.54 


(0.32,0.92) (0.21,1.39) 


0.44 


(0.27,0.73) (0.27,0.68) 



6. Discussion. Even with a great variety of metiiods available for lon- 
gitudinal data analysis, it can still be difficult to analyze longitudinal data 
in long sequences, especially when the serial correlation is strong and long- 
lived. In this paper, we examine standard GEE methods and propose new 
ones for marginal analysis of longitudinal count data in a small number of 
very long sequences. The methods are evaluated and compared in simulation 
experiments mimicking the NTDS, and the main findings can be summarized 
as follows. We consider the use of FSE in this particular situation, a sim- 
ple technique with important practical implications. It allows the effects of 
subject-level covariates to be estimated easily from a linear regression anal- 
ysis for the estimated FSE, and it also helps with trip-level covariates by 
removing the correlation due to population heterogeneity. For trip-level co- 
variates, we find that a standard GEE analysis under working independence 
can lead to inefficient estimates and serious undercoverage, and that both 
problems can be alleviated by incorporating a properly specified correlation 
structure. The latter approach works well for large counts but not for small 
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counts, mainly because the serial correlation is hard to estimate with small 
counts. We therefore explore an alternative approach (WCR) for the case 
of small counts. The original version of WCR and an extension to simple 
random sampling seem unsatisfactory, however, because of numerical insta- 
bility in repeated analyses of small samples and a bias magnification effect 
of WCR that results in variance underestimation. To address these issues, 
we propose an WCR-SB approach that involves separated blocks and that 
performs better than all of the previously considered methods (WCR and 
standard GEE). 

In Table 6 the WCR-SB analyses are not dramatically different from the 
other analyses. This is reassuring to the scientists, but it also suggests that 
the NTDS data set is not ideal for demonstrating the advantage of the WCR- 
SB approach. To put the latter point in perspective, we note that better 
performance in the frequentist sense does not imply better results in every 
realization. The performance of the WCR-SB approach has been evaluated 
in simulation experiments which are, in fact, designed to mimic the NTDS. 
Most of the uncertainty in designing the simulation experiments is associated 
with the length of the serial correlation, which is very difficult to estimate 
with small counts, as shown in Table 3. It is certainly possible that the serial 
correlation in the NTDS is not sufficiently long-lived for the new method to 
make a material difference. In that case, a better example than the NTDS 
might be a proposed follow-up study that involves 100 teenage drivers to 
be followed for at least 3 years. The larger amount of data from the latter 
study will afford a better understanding of both the nature and the length of 
the serial correlation. Furthermore, as teenage drivers gain experience and 
perspective over time, their driving behavior may become less haphazard 
and more stable, in which case the serial correlation will gradually become 
longer-lived toward the end of the (longer) study duration. Thus, in terms of 
the length of the serial correlation, the follow-up study will provide a better 
opportunity than the NTDS to demonstrate the advantage of the WCR-SB 
approach. 

The WCR-SB approach is designed for longitudinal data in a small num- 
ber of long sequences. The strength of this approach relative to standard 
GEE methods is the ability to handle strong and long-lived serial correla- 
tion when the mean count is low. The weaknesses of the WCR-SB approach 
include increased computational demand (relative to standard GEE meth- 
ods) and the need for information about the length of the serial correlation 
(in order to specify the separation size S). Because 7 can be very difficult to 
estimate, one may need to consult the subject-matter scientist or perform 
a sensitivity analysis with different values of 5", as we did in analyzing the 
NTDS data (see Section 5). 

As mentioned earlier at the end of Section 3.1, we have actually explored 
several existing alternatives to the robust variance estimate in the present 
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situation. No appreciable improvement has been achieved using a standard 
bootstrap procedure (i.e., samphng subjects with replacement) and jack- 
knife methods [Paik (1988), Lipsitz, Laird and Harrington (1990)]. A block 
bootstrap procedure [Kiinsch (1989)] appears to work well for short-lived 
serial correlation but not for long-lived serial correlation. In a simple setting 
that permits closed-form calculations of the finite-sample target of the ro- 
bust variance estimate (with sample quantities replaced by their population 
counterparts) under working independence, we have found that the target 
can be far below the sampling variance of the point estimate observed in 
simulations, suggesting that the asymptotic theory may not provide a rea- 
sonable approximation in this situation. Given that, it seems unlikely that 
a substantial improvement can be made using the available bias correction 
methods [e.g., Mancl and DeRouen (2001)]. Another possible approach to 
variance estimation is window subsampling [Sherman (1996), Heagerty and 
Lumley (2000), Oman et al. (2007)]. A practical difhculty with this ap- 
proach is specification of the window size, which has been found to have 
a large impact on the variance estimate. Heagerty and Lumley (2000) have 
suggested taking the maximum among variance estimates based on several 
window sizes, but it remains unclear how to choose the set of window sizes 
for the maximization. Maximizing over a large set of window sizes can lead 
to a very conservative variance estimate, as the maximum among several 
underestimators need not be an underestimator itself. 

Under the WCR approach, we have considered various sampling schemes 
based on time. An interesting possibility, suggested by a referee, is to sam- 
ple trips using a mechanism that involves other covariates than time and 
possibly the outcome. In the latter case, appropriate adjustments will be 
necessary to account for the outcome-dependent nature of the subsample. 
Further research is needed to explore the potential benefits of the more 
sophisticated sampling schemes. 

This article has been focused on valid inference in the sense of (nearly) 
correct coverage. We have not discussed the issue of generalization from 
a sample of subjects to the target population, which is always important 
and especially so when the number of subjects is small. Such generalizations 
should be easier to justify when the within-subject variability is large relative 
to the between-subject variability, which appears to be the case in the NTDS. 

APPENDIX A: IRLS ESTIMATION OF a FOLLOWING A GEE 

ANALYSIS WITH FSE 

Write v_= {I'l,- ■ ■ , Vn)' and P = (z?i , . . . , Vn)' , and let Spijy denote the vari- 
ance estimate from the GEE analysis. As the notation indicates, Spi^ es- 
timates the conditional variance Spi^ = var(P|i/) because the GEE model 
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with FSE is conditional on v_. The marginal variance of P is given by 

Sp = var(P) = var{E(2|ii)} + E{var(P|z^)} th cr^I„ + ES^ii,, 

where I^ denotes the n x n identity matrix. A natural estimate of Sp can 
be obtained as 

(14) % = a^an + ^v\u 

provided a reasonable estimate a^ is available. This suggests the following 
IRLS algorithm. We start by obtaining an initial estimate of a, say, the LS 
estimate. Then we proceed to the following steps: 

Step 1 . Calculate a moment estimate of o"^ , say, the mean squared resid- 
ual minus the mean diagonal element of Sp; 

Step 2. Substitute the estimate of o"^ into (14) and re-estimate a from 
a weighted least squares analysis based on the updated I]p. 

These steps can be iterated a number of times or until convergence. 

APPENDIX B: ESTIMATION OF PARAMETERS IN THE 

COVARIANCE MATRIX 

Let /2jj denote the fitted values from a preliminary GEE analysis without 
FSE. Then (6) shows that a moment estimate of the sum of a^, (j^ and cXg 



VS lliclli CI iilUiliCiiU COUliliCllC Ul tliC OUlil Ul C "" '^""^ '^ 

can be obtained as 



log 






where A^ = X]"=i ^i is the total number of observations. Further, equation (7) 
suggests that o"^, o"^ and 7 can be estimated from a nonlinear regression 
analysis with (Yij — ^ij){Yiji — Jly') / (Jlijjliji) as the response variable and 
with mean function 

expjo-fe -Fcr^exp(-7|tij -%'!)} - 1. 

The above quantities could be modified using bias correction adjustments 
[e.g., Davis, Dunsmuir and Wang (2000), Section 3.2] and/or smoothing 
techniques prior to the nonlinear regression analysis. However, such modifi- 
cations have not been found helpful in our experiments. Ideally, the nonlinear 
regression analysis should include all pairs of trips within subjects. However, 
this may be impractical for the NTDS data because a subject with thou- 
sands of trips would contribute millions of pairs, resulting in too many data 
points. As a compromise, we propose to base the nonlinear regression anal- 
ysis on two types of pairs: consecutive trips (with j' = j + 1) and symmetric 
trips (with j + j' = ki + 1). The pairs of consecutive trips are informative 
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about short-lived serial correlation, while the pairs of symmetric trips help 
characterize the overall correlation over the entire range of the gap time 

\*^ij ^2j/' I • 

Similar techniques can be used for a preliminary GEE analysis with fixed 
subject effects. With Jj-ijib^ denoting the fitted values with fixed subject ef- 
fects, the sum of o"^ and a^ can be estimated by 



log 






while the separate values of o"^ and 7 can be estimated from another non- 
linear regression analysis with {Yij -'jlij\i,^){Yij: -p,iji\f,^) / {'fiij\f,fiij,\i,^) as the 
response variable and with mean function 

exp{o-^exp(-7|tjj - tjj/|)} - 1. 

These are justified by (8) and (9), respectively. 

Initial values of the unknown parameters are required for fitting the above 
nonlinear regression models. In our experience, a reasonable way to obtain 
initial values appears to be the following linear regression method. Consider 
the case with FSE, and note that equation (9) can be rewritten as 

(15) log log E<^ ' h 1 ^ = logo-c - 7|tij - tij'l 

I fJ-ij\b,fJ-ij'\b, J 

To take advantage of this relationship, we can allocate the pairs of trips into 
bins according to the value of \tij — tij'\ and treat each bin as approximately 
homogeneous with respect to the gap time. Within each bin, we can re- 
place fJ-ij\bi with Jj-ijibi and expectation with sample average on the left-hand 
side of (15), replace \tij — tiji\ with a typical value (say, the median) on the 
right-hand side, and run a linear regression analysis with each bin as a data 
point. Initial estimates of cr^ and 7 can then be obtained by exponentiat- 
ing the estimated intercept and negating the estimated slope, respectively. 
This linear regression approach would not work for the case without FSE, 
for which we could use as initial estimates the final estimates from a GEE 
analysis with FSE. 
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